In the first part of this series on Linux processes, we built up an understanding of processes by discussing the main() function and the environment-related C functions.
In this article, we will discuss the memory layout of a process and the C functions that terminate a process.
Linux Processes Series: part 1, part 2 (this article), part 3
Memory Layout of a Process
The memory layout of a process in Linux can be very complicated if we try to present and describe everything in detail. So, here we will present only the parts that matter most.
If we try to visualize the memory layout of a process, we have something like this:
Let's explain each component of the above layout one by one:
- The command line arguments and the environment variables are stored at the top of the process memory layout at the higher addresses.
- Then comes the stack segment. This is the memory area the process uses to store the local variables of functions and other information that is saved every time a function is called. This other information includes the return address (i.e. the address at which execution continues in the caller) and some information about the caller's environment, such as its machine registers. It is also worth mentioning that each call of a recursive function creates a new stack frame, so that one set of local variables does not interfere with any other set.
- The heap segment is used for dynamic memory allocation. Like the other segments, it is part of the virtual address space of a single process and is not shared with other processes; only the threads of the same process share it. The heap grows on demand as the process allocates memory, so memory from this segment should be used cautiously and should be deallocated as soon as the process is done using it.
- As the figure above shows, the stack grows downwards (towards lower addresses) while the heap grows upwards (towards higher addresses).
- All the global variables that are not initialized in the program are stored in the BSS segment. When the program starts executing, all these uninitialized global variables are set to zero. Note that BSS stands for ‘Block Started by Symbol’.
- All the initialized global variables are stored in the data segment.
- Finally, the text segment is the memory area that contains the machine instructions that the CPU executes. Usually, this segment is shared across different instances of the same program being executed. Since there is no reason to change the instructions at run time, this segment is read-only.
Please note that the above figure is just a logical representation of the memory layout. There is no guarantee that the memory layout of a process on a given system will look exactly like this. Besides these segments, several others exist, for example for the symbol table and debugging information.
Process Terminating Functions exit() and _exit()
The following functions can cause a process to terminate :
- exit(status) (same as return status from main())
- _exit(status) or _Exit(status)
The difference between exit() and the _exit() and _Exit() functions is that the former performs some clean-up before handing control back to the kernel, while the other two return to the kernel immediately.
The function _exit() is specified by POSIX while _Exit() is specified by ISO C. Apart from this, there is no major difference between the two.
As already discussed above, clean-up is the major difference between exit() and _exit(). Before demonstrating this in practice, let's look at another function, ‘atexit()’.
Its prototype is:
int atexit(void (*function)(void));
As the name suggests, this is a C library function (not a system call) that takes a function pointer and registers that function as a clean-up function for the program. This means that the registered function gets called whenever the process terminates normally and the way it terminates supports clean-up.
If you read the last sentence of the above paragraph once again, you will see that functions registered with ‘atexit’ are part of the clean-up that differentiates exit() from _exit(). So, here is some code that uses the atexit() and exit() functions:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern char **environ;

void exitfunc(void)
{
    printf("\n Clean-up function called\n");
}

int main(int argc, char *argv[])
{
    int count = 0;

    atexit(exitfunc);
    printf("\n");

    while (environ[count++] != NULL)
    {
        // Do some stuff
    }

    exit(0);
}
In the code above, the function ‘exitfunc()’ is registered with the C library as a clean-up function by calling atexit().
When the above code is run:
$ ./environ
Clean-up function called
We see that the clean-up function was called.
If we change the call from exit() in the above code to _exit():
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern char **environ;

void exitfunc(void)
{
    printf("\n Clean-up function called\n");
}

int main(int argc, char *argv[])
{
    int count = 0;

    atexit(exitfunc);
    printf("\n");

    while (environ[count++] != NULL)
    {
        // Do some stuff
    }

    _exit(0);
}
If we run this program, we see:
$ ./environ
$
So we see that this time the cleanup function ‘exitfunc()’ was not called, which shows the difference between exit() and _exit() functions.
Comments on this entry are closed.
Hi,
This is a very good article. Linux provides a ‘size’ utility which lists the size of each section of the memory layout of any executable.
i.e. run it as:
$ size ./environ
“Usually, this segment is shared across different instances of the same program being executed.”
Could you elaborate on this a bit?
I am confused as to why a program segment should be shared across different programs.
@ABhatia
Take the example of a text editor. One can open many instances of a text editor, but there is no point in loading the text segment (corresponding to the text editor) again and again. So, the same text segment is shared across all the instances of the text editor.
This article is crystal-clear explained, going right to the point with examples.
I like that..!
it is useful ….
The heap is NOT shared across all of the processes on the system. I suspect the author needs to read about paging.
Mike is correct, they are NOT shared.
Thank you for the article, it is pretty clear!
Just 1 correction: The heap is shared among threads that belong to the same process. It is not shared among different processes.
Each process has its own virtual address space which includes all the listed segments: stack, heap, BSS etc. Hence, it’s impossible to share heap between 2 or more processes.
Meanwhile, threads created by the same process run within the same address space. They don’t share a stack (as otherwise it would be impossible to run multiple threads) but they do share the heap. That’s the reason we have mutexes and other thread synchronisation tools.